20 research outputs found

Scruples: A Corpus of Community Ethical Judgments on 32,000 Real-Life Anecdotes

Author: Le Bras Ronan
Choi Yejin
Lourie Nicholas
Publication date: 24/03/2021

As AI systems become an increasing part of people's everyday lives, it becomes ever more important that they understand people's ethical norms. Motivated by descriptive ethics, a field of study that focuses on people's descriptive judgments rather than theoretical prescriptions on morality, we investigate a novel, data-driven approach to machine ethics. We introduce Scruples, the first large-scale dataset with 625,000 ethical judgments over 32,000 real-life anecdotes. Each anecdote recounts a complex ethical situation, often posing moral dilemmas, paired with a distribution of judgments contributed by the community members. Our dataset presents a major challenge to state-of-the-art neural language models, leaving significant room for improvement. However, when presented with simplified moral situations, the results are considerably more promising, suggesting that neural models can effectively learn simpler ethical building blocks. A key take-away of our empirical analysis is that norms are not always clean-cut; many situations are naturally divisive. We present a new method to estimate the best possible performance on such tasks with inherently diverse label distributions, and explore likelihood functions that separate intrinsic from model uncertainty.

Comment: 18 pages, 14 tables, 18 figures. Accepted to AAAI 2021. For associated code and data, see https://github.com/allenai/scruple

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark

Author: Bhagavatula Chandra
Le Bras Ronan
Choi Yejin
Lourie Nicholas
Publication date: 24/03/2021

Commonsense AI has long been seen as a near impossible goal -- until recently. Now, research interest has sharply increased with an influx of new benchmarks and models. We propose two new ways to evaluate commonsense models, emphasizing their generality on new tasks and building on diverse, recently introduced benchmarks. First, we propose a new multitask benchmark, RAINBOW, to promote research on commonsense models that generalize well over multiple tasks and datasets. Second, we propose a novel evaluation, the cost equivalent curve, that sheds new insight on how the choice of source datasets, pretrained language models, and transfer learning methods impacts performance and data efficiency. We perform extensive experiments -- over 200 experiments encompassing 4800 models -- and report multiple valuable and sometimes surprising findings, e.g., that transfer almost always leads to better or equivalent performance if following a particular recipe, that QA-based commonsense datasets transfer well with each other, while commonsense knowledge graphs do not, and that perhaps counter-intuitively, larger models benefit more from transfer than smaller ones. Last but not least, we introduce a new universal commonsense reasoning model, UNICORN, that establishes new state-of-the-art performance across 8 popular commonsense benchmarks, aNLI (87.3%), CosmosQA (91.8%), HellaSWAG (93.9%), PIQA (90.1%), SocialIQa (83.2%), WinoGrande (86.6%), CycIC (94.0%) and CommonsenseQA (79.3%).

Comment: 27 pages, 19 figures, 34 tables. Accepted to AAAI 2021. For associated code and data see https://github.com/allenai/rainbo

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning

Author: Allaway Emily
Bhagavatula Chandra
Choi Yejin
LeBras Ronan
Lourie Nicholas
Rashkin Hannah
Roof Brendan
Sap Maarten
Smith Noah A.
Publication date: 07/02/2019

We present ATOMIC, an atlas of everyday commonsense reasoning, organized through 877k textual descriptions of inferential knowledge. Compared to existing resources that center around taxonomic knowledge, ATOMIC focuses on inferential knowledge organized as typed if-then relations with variables (e.g., "if X pays Y a compliment, then Y will likely return the compliment"). We propose nine if-then relation types to distinguish causes vs. effects, agents vs. themes, voluntary vs. involuntary events, and actions vs. mental states. By generatively training on the rich inferential knowledge described in ATOMIC, we show that neural models can acquire simple commonsense capabilities and reason about previously unseen events. Experimental results demonstrate that multitask models that incorporate the hierarchical structure of if-then relation types lead to more accurate inference compared to models trained in isolation, as measured by both automatic and human evaluation.

Comment: AAAI 2019 C

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

GENIE: A Leaderboard for Human-in-the-Loop Evaluation of Text Generation

Author: Bragg Jonathan
Choi Yejin
Kasai Jungo
Khashabi Daniel
Lourie Nicholas
Smith Noah A.
Stanovsky Gabriel
Weld Daniel S.
Publication date: 11/06/2021

Leaderboards have eased model development for many NLP datasets by standardizing their evaluation and delegating it to an independent external repository. Their adoption, however, is so far limited to tasks that can be reliably evaluated in an automatic manner. This work introduces GENIE, an extensible human evaluation leaderboard, which brings the ease of leaderboards to text generation tasks. GENIE automatically posts leaderboard submissions to crowdsourcing platforms asking human annotators to evaluate them on various axes (e.g., correctness, conciseness, fluency) and compares their answers to various automatic metrics. We introduce several datasets in English to GENIE, representing four core challenges in text generation: machine translation, summarization, commonsense reasoning, and machine comprehension. We provide formal granular evaluation metrics and identify areas for future research. We make GENIE publicly available and hope that it will spur progress in language generation models as well as their automatic and manual evaluation

arXiv.org e-Print Archive

Instrumental performance and results from testing of the BLAST-TNG receiver, submillimeter optics, and MKID arrays

Polarized thermal emission from interstellar dust grains can be used to map magnetic fields in star forming molecular clouds and the diffuse interstellar medium (ISM). The Balloon-borne Large Aperture Submillimeter Telescope for Polarimetry (BLASTPol) flew from Antarctica in 2010 and 2012 and produced degree-scale polarization maps of several nearby molecular clouds with arcminute resolution. The success of BLASTPol has motivated a next-generation instrument, BLAST-TNG, which will use more than 3000 linear polarization sensitive microwave kinetic inductance detectors (MKIDs) combined with a 2.5 m diameter carbon fiber primary mirror to make diffraction-limited observations at 250, 350, and 500 μm. With 16 times the mapping speed of BLASTPol, sub-arcminute resolution, and a longer flight time, BLAST-TNG will be able to examine nearby molecular clouds and the diffuse galactic dust polarization spectrum in unprecedented detail. The 250 μm detector array has been integrated into the new cryogenic receiver, and is undergoing testing to establish the optical and polarization characteristics of the instrument. BLAST-TNG will demonstrate the effectiveness of kilo-pixel MKID arrays for applications in submillimeter astronomy. BLAST-TNG is scheduled to fly from Antarctica in December 2017 for 28 days and will be the first balloon-borne telescope to offer a quarter of the flight for "shared risk" observing by the community.

Comment: Presented at SPIE Millimeter, Submillimeter, and Far-Infrared Detectors and Instrumentation for Astronomy VIII, June 29th, 201

arXiv.org e-Print Archive

Crossref

Online Research @ Cardiff

Characterization, deployment, and in-flight performance of the BLAST-TNG cryogenic receiver

Author: Ade Peter A. R.
Ashton Peter C.
Austermann Jason E.
Coppi Gabriele
Cox Erin G.
Devlin Mark J.
Dober Bradley J.
Fanfani Valentina
Fissel Laura M.
Galitzki Nicholas
Gao Jiansong
Gordon Samuel
Groppi Christopher E.
Hilton Gene C.
Hubmayr Johannes
Klein Jeffrey
Li Dale
Lourie Nathan P.
Lowe Ian
Mani Hamdi
Mauskopf Philip
McKenney Christopher
Nati Federico
Novak Giles
Pisano Giampaolo
Romualdez L. Javier
Sinclair Adrian
Soler Juan D.
Tucker Carole
Ullom Joel
Vissers Michael
Wheeler Caleb
Williams Paul A.
Publication venue: 'SPIE-Intl Soc Optical Eng'
Publication date: 01/01/2020

The Next Generation Balloon-borne Large Aperture Submillimeter Telescope (BLAST-TNG) is a submillimeter polarimeter designed to map interstellar dust and galactic foregrounds at 250, 350, and 500 microns during a 24-day Antarctic flight. The BLAST-TNG detector arrays comprise 918, 469, and 272 MKID pixels, respectively. The pixels are formed from two orthogonally oriented, crossed, linear-polarization sensitive MKID antennae. The arrays are cooled to sub-300 mK temperatures and stabilized via a closed cycle ³He sorption fridge in combination with a ⁴He vacuum pot. The detectors are read out through a combination of the second-generation Reconfigurable Open Architecture Computing Hardware (ROACH2) and custom RF electronics designed for BLAST-TNG. The firmware and software designed to read out and characterize these detectors were built from scratch by the BLAST team around these detectors, and have been adapted for use by other MKID instruments such as TolTEC and OLIMPO. We present an overview of these systems as well as in-depth methodology of the ground-based characterization and the measured in-flight performance.

Comment: Presented at SPIE Millimeter, Submillimeter, and Far-Infrared Detectors and Instrumentation for Astronomy X, December 13-18, 202

arXiv.org e-Print Archive

Online Research @ Cardiff

Archivio della ricerca- Università di Roma La Sapienza